Abstract

Maintaining high quality content is one of the foremost objectives of any web-based
collaborative service that depends on a large number of users. In such systems, it is
as nearly impossible for automated scripts to judge semantics as it is to expect all editors
to review the content. This catalyzes the need for trust-based mechanisms to ensure
quality of an article immediately after an edit. In this paper, we build on previous work
and develop a framework based on the 'web of trust' concept to calculate satisfaction
scores for all users without the need for perusing the article. We derive some bounds for
systems based on our mechanism and show that the optimization problem of selecting
the best users to review an article is NP-Hard. Extensive simulations validate our
model and results, and show that trust-based mechanisms are essential to improve
efficiency in any online collaborative editing platform.

Keywords: Social networks, trust, collaborative work, performance
1 Introduction

The emergence and evolution of the world wide web has shifted focus towards services that
follow a dynamic and interactive paradigm [TJl |23]. Perhaps the most prominent among
these are collaborative platforms that have revolutionized the physics of content generation
and communication. These include, but are not limited to, wikis, weblogs, versioning software
and real-time document editing suites. Services like Wikia allow just about anyone to
create a new wiki on any desired theme and indeed, wikis exist on topics from Vintage
Sewing to Star Wars p].

Unfortunately, the open-to-edit nature of these systems has led to serious quality breaches
in the past [2], deterring their use in academic and professional contexts [36]. Ensuring that
the content conforms to minimum quality standards is essential to enable the widespread
adoption of these platforms. However, taking into consideration the dynamic nature and
large user base of such communities, it is not practical to expect every user to review or edit
the document before labeling it as 'reliable'. At the same time, different users have different
satisfaction criteria and expectations, and it is not a trivial task to translate the satisfaction
level of one user to the rest of the community. Thus, an important challenge is to quantify
content quality and accuracy in terms of the satisfaction levels of different collaborators and
determine when the document is ready for publication while using minimal human effort.
One metric that expedites this process is trust.

Recent work has focused on exploiting the underlying social network structure in online 
communities. This has led to the development of trust-based mechanisms for recommendation
systems [HI ESI SO], peer-to-peer networks, Internet transactions [21] and other
general web-based networks [20]. A common approach is to use a trust propagation model
similar to the one propounded in [22] to discern user preferences, i.e. for a given item, based 
on the feedback of users who have already tried it, determine whether or not to recommend 
it to an arbitrary user. 

The problem of estimating satisfaction in networks is also similar to that of information 
propagation/diffusion, which has been the subject of a rich literature [21]. Kempe et al. [25]
considered the problem of selecting the most influential nodes in a graph to market an
item to, in order to ensure its maximal spread, and showed that the optimization version
of this problem is NP-Hard. These works, however, do not answer a few critical questions
pertinent to collaborative document-editing: how to determine when the document is ready 
for publishing and if not, how to choose the next person to edit or estimate its quality. 

In [16], the authors introduce the notion of trust for collaborative work. They define a 
measure known as satisfaction score, which is the estimated satisfaction or rating of a user 
at the current stage of document development, when the user has not yet read the article. 
Their model assumes that every user $i$ has a trust value for every other user $j$ ($t_{ij} \in [0, 1]$),
apart from a unique threshold ($b_i \in [0, 1]$). If the satisfaction score of a user is below his
threshold, then he is considered to be 'unsatisfied'. An editor $j$ upon reviewing the article
gives it an unbiased rating $c_j$ and the satisfaction score of an arbitrary user $i$ is calculated
as follows:

$$ s_i = c_j \times t_{ij} \qquad (1) $$

Various mechanisms were proposed to select the successive editor. Document editing ends
when all users are satisfied with the article (i.e. $s_i > b_i, \forall i$).
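
The single-editor model of Equation (1) can be sketched in a few lines. This is only an illustration: the function names, user labels, trust values and thresholds below are hypothetical and do not come from [16].

```python
def satisfaction_scores(rating, trust_in_editor):
    """Equation (1): s_i = c_j * t_ij for every user i, given editor j's rating c_j."""
    return {user: rating * t for user, t in trust_in_editor.items()}


def unsatisfied_users(scores, thresholds):
    """Users whose satisfaction score is below their threshold b_i."""
    return sorted(u for u, s in scores.items() if s < thresholds[u])


# Hypothetical data: trust of three users in the editor j, and a common threshold.
trust_in_editor = {"u1": 1.0, "u2": 0.5, "u3": 0.75}
thresholds = {"u1": 0.3, "u2": 0.3, "u3": 0.3}

scores = satisfaction_scores(0.5, trust_in_editor)
print(scores)
print(unsatisfied_users(scores, thresholds))
```

Editing would continue in this model for as long as the second call returns a non-empty list.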

In this paper, we propose a generalized mechanism to calculate satisfaction scores for users 
in arbitrary social networks, and use this to select the successive reviewers for an article. We 
use the term 'web of trust' to denote social networks where users trust only those nodes with 
whom they have interacted previously, and this trust is quantifiable. The situation of interest 
here would be one where a new article has to be reviewed so that sufficient quality approvals 
(in terms of satisfaction among all users) can be reached. As regards the computation of 
satisfaction scores, the model proposed in [16] emerges as a special case of our model when
the underlying graph is a complete graph. We later show that the social network structure 
can be explicit (in terms of contacts and followers) or implicitly derived based on similarity 
of opinion and content. 

In order to analyze the efficiency gained by using our model, we resort to random graphs
where each edge has a non-zero probability $p$ of existing. Random graphs, first proposed by
Erdős and Rényi [17], have found extensive applications in the past, especially in the case of
epidemic propagation [T21[16]. We derive some properties of our model in random graphs and
use them to show a bound on the number of users required to review an article, in the case
when it is of acceptable quality. Although graph models based on the Small World [IT] and
Power law [8] distributions are more realistic representations of social networks, Erdős-Rényi
graphs allow us to gain a better understanding of the parameters governing the satisfaction 
distribution while allowing some analytical reasoning. Our main objective here is to use 
trust to minimize the human evaluation effort required to be expended at each stage of the 
document development. It is pertinent to remark here that we are concerned only with the 
trust between pairs of users rather than global trust better known as reputation [32]. This 
has been done giving due consideration to the subjective nature of content on the Internet 
as not all collaborative platforms have factual accuracy as their sole aim. For instance, 
interpretations of most works of fiction are highly subjective. 

The remainder of the paper is organized as follows. In Sections 2 and 3, we define the
idea of trust for a collaborative platform and propose a model based on trust to calculate the
satisfaction score of each user with respect to a single article. We also prove uniqueness and
existence of satisfaction scores and present an efficient algorithm to calculate the scores using
trust matrices. In Section 4, we consider the random graph model, and prove bounds on the
minimum number of users required to review the article based on the expected satisfaction
score. We prove that the optimization problem of selecting the best users to review the
article is NP-Hard in Section 5, and present an improvised greedy algorithm to choose the
best potential raters. Finally, in Section 6, we validate our results and observations via
simulation and discuss future work in Section 7.

1.1 Additional Related work 

The concept of trust in computation has been inspired by everyday human relationships and
has found applications in numerous fields including medical information [11], mobile [13] and
pervasive computing [37], and security pUj. However, the growth of the world wide web has
resulted in an alarmingly large number of transactions between complete strangers and it has
become imperative to utilise feedback and transactional history to develop trust or reputation
based models, the most prominent being the ones on Epinions [28] and eBay [33]. Nowadays,
personalized content is almost ubiquitous on the web in the form of advertisements, search
results, movie recommendations (see the Netflix $1 million challenge [2Zj), interesting links,
etc. Many of these systems consider only binary product opinion (like/dislike) and thus the
results obtained are not valid when ratings vary across a continuous scale. Recommendation
systems are also not conservative, in the sense that their motive is to exactly predict a
non-rater's vote preference for a given item, as opposed to being cautious in the case of
satisfaction scores.

Our work is most similar to that of Rozenfeld and Tennenholtz [35], who propose a
continuous, consistent recommendation system. The model proposed in Section 3 can be
viewed as a generalization of their recommendation system. However, a few of their properties
are not very pertinent to the case of collaborative editing, though we show that our specific
function satisfies most properties like consistency and monotonicity.

Network propagation models have received extensive consideration in the contexts of
epidemiology [IS], (mis)information, cascading faults [7], patches [HSj, etc. A majority
of these models overlook the need for trust and concentrate more on the temporal aspects
of information propagation than on information intensity and attenuation of information
reliability as one moves away from the source. The generalized threshold model in [25]
considers a threshold function for every user ($v$) which maps all subsets of $v$'s neighbor
set to arbitrary values in [0, 1]. Our model can be viewed as an alternative realization of
the same with fixed thresholds and continuous satisfaction scores. The problem of selecting
the most influential nodes to market a product to is of great significance to economics
and data mining and was first posed formally by Domingos and Richardson [15], sparking
off a flurry of research. Subsequent work [TH| ES] has proven that most varieties of
the optimization problem of selecting the best seeders are NP-Hard. An excellent survey
of information propagation models and the "Maximizing influence" problem can be found
in [B].

With respect to quality of collaborative content, the recently proposed WikiTrust [5] uses
author reputation and number of edits to measure the trustworthiness of each word in a wiki
article, and detect vandalism. Automated evaluations based on global reputation and article
semantics are, however, beyond the scope of this paper and we stick to using reviews from a
small subset of the user base to calculate satisfaction scores for the rest. As far as we are
aware, this is the first study applying the web of trust model to collaborative work.

2 Model 
2.1 Trust 

Although the notion of trust in computation has been inspired heavily by real-life relation- 
ships, its definition is application-dependent. For instance, it is pertinent to define trust in 
recommendation systems based on 'similarity of tastes and preferences' as it makes sense in 
P2P applications to define it based on a user's bandwidth and upload/download ratio. For 
the purpose of collaborative editing, we derive inspiration from the definition proposed by 
Sztompka [38] : 

"Trust is a bet about the future contingent actions of others" , 

which is interpreted in our application as: Trust is the amount of faith a user has in the
choices and actions of another user. This definition is consistent with Equation (1) for
calculating satisfaction scores based on the rating of a single user (the same user who had
edited the article). The model presented in this paper is based on classically accepted
assumptions and notations:

1. $t_{ij}$ represents the trust user $i$ has in the actions of user $j$. This can be viewed as a
directed edge from $i$ to $j$ which represents the direction of trust ($i$ trusts $j$ and hence
draws information about the article from $j$'s assessment to make an estimation about
his own satisfaction).

2. $\forall i, j,\ 0 \le t_{ij} \le 1$, wherein a trust of 0 indicates that either $i$ has no knowledge of $j$ or
does not trust $j$ at all. The absence of an edge from $i$ to $j$ indicates that $i$ does not
know (and hence cannot trust) $j$, i.e. he will ignore $j$'s assessment in his satisfaction
estimation. A trust score of 1 indicates that $i$ has complete faith in $j$'s actions.

3. $t_{ij}$ is not necessarily equal to $t_{ji}$. This models the asymmetry that exists in real-life
relationships.

We relax two key assumptions made in [16] in our model:

1. Having a trust value between every pair of users imposes the restriction that every
collaborator must have some knowledge about the reliability of every other collaborator.
This is, however, impossible in large online communities where the total number of user
pairs is of the order $O(N^2)$, where $N$ is the number of collaborators. In our model,
users have trust values only for a subset of the total user base, i.e. other users about
whom they have prior knowledge.

2. Instead of the article being rated by only the previous editor, our model allows a small 
subset of the collaborators to read and rate the article, and if necessary edit it in which 
case the article has to be rated again by other interested users. 

Thus, one can view the system as a directed graph where an edge between a pair of users
indicates existing trust and thereby some sort of prior interaction. Figure 1 shows the sample
graph of relationships in a network with six users. The shaded circles represent the raters of
the current version of the article.

Figure 1: Sample graph representing a community with 6 users and 2 raters

The above assumptions directly yield our problem statement:

Problem: Given a large community of users, all interested in the outcome of a single
document or entity, and a set of trust values for each user on other users, use the
quality evaluation of the document provided by a few users (raters) to estimate
the satisfaction of the others (non-raters).

In the rest of the paper, we will denote the social (trust) network by the triplet
$G := ((V, E), T, B)$, where $V, E$ are the vertices and edges respectively of a directed graph with
$|V| = N$, each node being a collaborator of the document and edges representing trust
between collaborators^1. For the sake of consistency, we shall use $(i, j) \in E$ to represent a
directed edge from $i$ to $j$, meaning that $i$ trusts $j$ (to some extent). $T : E \to [0, 1]$, referred
to as the trust matrix, consists of the weights for each edge, i.e. trust between users. As
mentioned previously, we shall use $t_{ij}$ to denote the trust user $i$ has in user $j$. Finally,
$B : V \to [0, 1]$ is a set of threshold values that denote the minimum quality expectation of
each user regarding the article.

We shall also use the notation $G_R = ((V, E), T, B, N_R, R)$ to include the details of the
raters in the situation description. By rater, we refer to a collaborator who has read the
document and given it a quality estimation or rating on a scale of 0 to 1. $N_R \subseteq V$ represents
the set of raters. $R : N_R \to [0, 1]$ represents the rating associated with each rater. Abusing
notation, we shall henceforth use $r_i = R(i)$ to denote the rating of user $i$, if he is a rater, and
$b_i = B(i)$ to denote his quality threshold.

Let $N(i)$ represent all nodes that user $i$ trusts, i.e. $N(i) = \{j : (i, j) \in E\}$. Then we
denote by $N_R(i)$ the neighbors of $i$ who are raters and by $N_N(i)$ the non-raters, i.e.
$N_R(i) = N(i) \cap N_R$ and $N_N(i) = N(i) \setminus N_R$. Finally, a user $i$ is said to be satisfied if his
satisfaction level is larger than his quality threshold, i.e. $s_i > b_i$.
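
To make the notation concrete, one possible in-memory representation of $G_R$ is sketched below. The class name `TrustNetwork`, its field names, and the sample values are illustrative assumptions rather than part of the model; the trust matrix is stored sparsely, one dictionary of trusted neighbours per user.

```python
from dataclasses import dataclass, field


@dataclass
class TrustNetwork:
    """Container for G_R = ((V, E), T, B, N_R, R); all names here are illustrative."""
    trust: dict                                   # trust[i][j] = t_ij for each edge (i, j) in E
    thresholds: dict                              # thresholds[i] = b_i
    ratings: dict = field(default_factory=dict)   # ratings[i] = r_i, defined only for raters

    def neighbors(self, i):
        """N(i): every user that i trusts directly."""
        return set(self.trust.get(i, {}))

    def rater_neighbors(self, i):
        """N_R(i): neighbours of i that are raters."""
        return self.neighbors(i) & set(self.ratings)

    def non_rater_neighbors(self, i):
        """N_N(i): neighbours of i that are not raters."""
        return self.neighbors(i) - set(self.ratings)


# Hypothetical four-user network in which B is the only rater.
net = TrustNetwork(
    trust={"A": {"B": 0.5, "F": 0.25}, "D": {"A": 0.75}, "F": {"A": 0.5}, "B": {}},
    thresholds={"A": 0.5, "B": 0.5, "D": 0.5, "F": 0.5},
    ratings={"B": 1.0},
)
print(net.rater_neighbors("A"), net.non_rater_neighbors("A"))
```

Storing only existing edges keeps memory proportional to $|E|$ rather than $N^2$, which matters once users trust only a small subset of the community.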



3 Satisfaction Estimation Model 

Intuitively, the satisfaction level of a non-rater should depend not only on the ratings of the 
various raters that he is connected to, but also on the transitivity of trust values as ratings 
spread across the network. The main question that arises is: if a non-rater does not directly 
trust a certain rater, then how can he take into account that rating in his own satisfaction 
calculation? This question has received widespread attention in the literature [221 El] , as 
propagation of (mis)trust is a key issue in most trust-based networks. For instance, let us 
consider the graph of Figure 1 induced on only the three users {A, B, D}.

Figure 2: The network of Figure 1 induced on the three users A, B, D

From the figure, and following (1), A's satisfaction should be proportional to $r_B t_{AB}$.
On the other hand, since D does not directly know B, he can only estimate his satisfaction
based on A's satisfaction, thus depending on A's belief about B. This can be interpreted
as $s_D = (t_{DA} \circ t_{AB}) r_B$, where $\circ$ may be any appropriate binary operator depending
on the application. In this work, we take $\circ = \times$, the binary multiplication operator, in
order to ensure that our system is conservative (to be defined later). Thus,
$s_D = (t_{DA} t_{AB}) r_B$. Indeed, in this paper trust is used to factor in
a certain level of caution when basing one's decision on the recommendations of others.
Overestimating one's satisfaction score because of false recommendations must in particular
be avoided; hence the multiplication by the trust value in the above satisfaction calculation.

^1 We use the terms user and collaborator interchangeably throughout this paper.

However, this problem becomes much more complicated when there are multiple raters
and multiple paths between a rater and non-rater. It is common practice to tackle this
problem by estimating (indirect) trust between all pairs of users. However, in this work we
avoid indirect trust estimation and instead compute the satisfaction score of a non-rater as
a weighted average of the (discounted by the trusts) scores of its trusted peers, using the
following generalized model.



$$ s_i = \sum_{j_1 \in N_R(i)} f(\cdot)\, r_{j_1} + \sum_{j_2 \in N_N(i)} g(\cdot)\, s_{j_2} \qquad (2) $$

where the weight functions $f$ and $g$ are generic.

Based on various requirements (to be elucidated in the following section), we propose 
and analyze the following specific equation in this paper: 



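$$ s_i = \begin{cases} \dfrac{\alpha \sum_{j_1 \in N_R(i)} t_{ij_1}^2 r_{j_1} + (1-\alpha) \sum_{j_2 \in N_N(i)} t_{ij_2}^2 s_{j_2}}{\alpha \sum_{j_1 \in N_R(i)} t_{ij_1} + (1-\alpha) \sum_{j_2 \in N_N(i)} t_{ij_2}} & \text{if } i \text{ is a non-rater} \\[2ex] r_i & \text{if } i \text{ is a rater,} \end{cases} \qquad (3) $$

where $\alpha$ is a parameter which determines the relative importance of raters with respect to
non-raters. In this model, we take $1/2 < \alpha \le 1$, in order to provide greater weight to the
opinions of a person who has directly read the article than to someone whose opinions are based
on hearsay.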

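To simplify the expression, we write the weights used in (3) as follows:

$$ \forall i \notin N_R,\ \forall j \in N(i), \quad w_{ij} := t_{ij}^2\, \frac{\alpha \mathbb{1}_{\{j \in N_R\}} + (1-\alpha) \mathbb{1}_{\{j \notin N_R\}}}{\alpha \sum_{j_1 \in N_R(i)} t_{ij_1} + (1-\alpha) \sum_{j_2 \in N_N(i)} t_{ij_2}} \qquad (4) $$

where $\mathbb{1}_{\{x\}} = 1$ if $x$ is satisfied and 0 otherwise. With this notation, (3) implies in particular
that

$$ \forall i \notin N_R, \quad s_i = \sum_{j \in N(i)} w_{ij} s_j, \qquad (5) $$

with $s_j = r_j$ for every rater $j$.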

Mathematically, the model can be viewed as follows. Suppose a node A has neighbors with
satisfaction scores of $\{s_1, s_2, s_3, s_4\}$ (each neighbor can be a rater or a non-rater). Then, in
accordance with Equation (1), the node receives values $\{t_{A1}s_1, t_{A2}s_2, t_{A3}s_3, t_{A4}s_4\}$ as possible
satisfaction scores. Thus it is reasonable to take the actual satisfaction level of the node as a
weighted mean of the incoming values from each of its neighbors. Here, we take the weight
of the $j$-th satisfaction recommendation ($t_{Aj}s_j$) to be the trust value on the corresponding
neighbor ($t_{Aj}$). Weighted averaging with priority $\alpha$ for raters and $(1 - \alpha)$ for non-raters
yields Equation (3) for satisfaction calculation. It is also apparent that when the underlying
social network is a complete graph (i.e. $E = V \times V$, which occurs when all collaborators
have prior knowledge about each other), there is only one rater ($|N_R| = 1$) and $\alpha = 1$ (zero
weightage to satisfaction scores of non-raters), Equation (3) simplifies to Equation (1). Thus,
the satisfaction estimation model proposed here is an extended version of the model in [16].
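
The weighted averaging just described can be sketched for a single non-rater as follows. This is a minimal illustration under an assumption: Equation (3) is taken to be the mean of the recommendations $t_{ij}s_j$ with weights $\alpha t_{ij}$ for raters and $(1-\alpha)t_{ij}$ for non-raters, as the paragraph above describes. The function name and the sample values are hypothetical.

```python
def non_rater_satisfaction(trust, neighbor_scores, raters, alpha=0.75):
    """Weighted mean of the recommendations t_ij * s_j received by one non-rater.

    trust           -- {j: t_ij} for the neighbours j that the user trusts
    neighbor_scores -- {j: s_j}; for a rater j this is the rating r_j
    raters          -- set of neighbours that have rated the article
    alpha           -- priority given to raters over non-raters (assumed form of Eq. (3))
    """
    num = den = 0.0
    for j, t in trust.items():
        priority = alpha if j in raters else 1.0 - alpha
        num += priority * t * (t * neighbor_scores[j])   # weight times recommendation
        den += priority * t                              # normalising sum of weights
    return num / den if den > 0 else 0.0                 # no usable neighbour: score 0


# With a single rater and alpha = 1 the value reduces to t_ij * r_j, as in Equation (1).
print(non_rater_satisfaction({"B": 0.5}, {"B": 0.5}, {"B"}, alpha=1.0))
```

Returning 0 when the denominator vanishes is a cautious convention chosen for this sketch, in the spirit of the conservative treatment of users who cannot reach any rater.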

The above formula yields a system of linear equations with $N - |N_R|$ unknowns, where
the unknown variables are the satisfaction scores ($s_i$). A necessary condition for a non-trivial
solution to exist is that $|N_R| > 0$, i.e. there must exist at least one collaborator who has read
and rated the article. Drawing a parallel to network diffusion algorithms, one may attempt
to calculate the spread of satisfaction scores across the network^2. It is not very clear as to
whether such an algorithm would converge, as the satisfaction score of a user is dependent
on the satisfaction scores of his neighbors, which may or may not be unknowns themselves
at any given stage. The worst case is clear from Figure 1, where the satisfaction score of
A depends on F, which in turn depends on A's score itself. Moreover, there is also some
ambiguity over what the satisfaction of a non-rater should be when there is no path between
him and a rater. These difficulties are tackled in the following section where a simple, yet
computationally efficient iterative algorithm is presented to calculate satisfaction scores.

We are now in a position to describe the complete collaborative editing process by means
of a mechanism that uses the satisfaction estimation model defined above. The mechanism
is managed by a central authority which selects the most appropriate user to read/edit the
document at each stage and also maintains the trust values between users. It is assumed that
all users are interested only in improving the quality of the article and are without malicious
intent, though we shall later present a trust update mechanism that provides some incentive
for non-vandalistic behaviour. Algorithm 1 details the collaborative editing process.

Simply put, as long as there exists at least one collaborator who is unsatisfied with
the document, the algorithm chooses a user to rate (and potentially edit if necessary) the
document. The only case where a deadlock might arise in the process is when a rater gives
the article a rating lower than his own threshold value. However, such a case is not likely to
arise in practical situations, as a collaborator who is unsatisfied with the article quality will
edit it himself in order to improve upon the content he is dissatisfied with. We remark here
that if a constant $0 < \eta < 1$ is defined to be the fraction of satisfied users required in order
to publish the document, then the algorithm can be modified (at line 5) to continue the loop
if the fraction of satisfied users is less than $\eta$, or else end the editing process and publish the
document.

^2 Starting from the source(s), the ratings spread across the network, and the satisfaction of a node at a
distance $k$ from the source is calculated at the $k$-th iteration. This technique is analogous to breadth-first
search.
Algorithm 1 Collaborative Editing Scheme
 1: $D \leftarrow$ Document
 2: $D(0) \leftarrow$ Initial State of Document
 3: $n \leftarrow 0$
 4: $s_j \leftarrow 0\ \forall j$ {Set all users to be unsatisfied}
 5: while $\exists j,\ s_j < b_j$ do
 6:    select user ($i$) to read and rate the document
 7:    if $D$ has been edited then
 8:       $D(n + 1) \leftarrow$ Updated Document
 9:       $n \leftarrow n + 1$
10:       $s_j \leftarrow 0\ \forall j$ {Reset satisfaction values to zero}
11:    else
12:       $r_i \leftarrow$ Rating of Article
13:       Update Satisfaction Scores of all non-raters using Equation (3)
14:    end if
15: end while
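
A runnable sketch of this loop is given below. Several pieces are assumptions made only for illustration: the `review` callback (which returns a rating and, optionally, an edited document), the naive policy of picking the first user who has not yet rated, the explicit deadlock check, and the helper `estimate`, which repeatedly applies the $\alpha$-weighted average that Equation (3) is assumed to express. The central authority's actual selection rule is the subject of later sections.

```python
def estimate(trust, ratings, alpha=0.75, sweeps=100):
    """Repeatedly apply the assumed form of Equation (3); raters keep their ratings."""
    s = {u: ratings.get(u, 0.0) for u in trust}
    for _ in range(sweeps):
        new = {}
        for i, nbrs in trust.items():
            if i in ratings:
                new[i] = ratings[i]
                continue
            num = den = 0.0
            for j, t in nbrs.items():
                p = alpha if j in ratings else 1.0 - alpha
                num += p * t * t * s[j]
                den += p * t
            new[i] = num / den if den > 0 else 0.0
        s = new
    return s


def collaborative_editing(trust, thresholds, review, document):
    """Sketch of Algorithm 1 with a hypothetical 'first non-rater' selection policy."""
    ratings = {}
    scores = {u: 0.0 for u in trust}                 # everyone starts unsatisfied
    while any(scores[u] < thresholds[u] for u in trust):
        candidates = [u for u in trust if u not in ratings]
        if not candidates:                           # deadlock: all rated, someone unsatisfied
            break
        user = candidates[0]
        rating, edited = review(user, document)
        if edited is not None:                       # document changed: start a new version
            document = edited
            ratings = {}
            scores = {u: 0.0 for u in trust}
        else:                                        # document kept: record the rating
            ratings[user] = rating
            scores = estimate(trust, ratings)
    return document, ratings, scores


# Hypothetical two-user community in which every reviewer approves the draft.
trust = {"A": {"B": 1.0}, "B": {"A": 1.0}}
thresholds = {"A": 0.5, "B": 0.5}
print(collaborative_editing(trust, thresholds, lambda user, doc: (1.0, None), "draft"))
```

In this toy run a single rating from A is enough to satisfy B, so B never needs to read the article.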



Properties satisfied by the model 

The model defined satisfies a few properties which make it desirable for collaborative
publishing. The definitions of most of these properties are based on those listed in [35], but are
modified to reflect the needs of collaborative document editing.

1. Stability of Satisfaction: The satisfaction score of a user is altered iff a non-rater 
becomes a rater, or the article content changes. This stability of scores holds because 
trust values remain constant throughout the collaborative editing process. Later we 
propose a trust updation mechanism that maintains this property by updating trust 
scores only between raters. 

2. Bounded Satisfaction: $0 \le s_i \le 1\ \forall i$, irrespective of $G$. This is true because
$0 \le t_{ij} \le 1$ and $0 \le r_j \le 1\ \forall i, j$, and thus any weighted average of their products lies
within the same limits.

3. Conservativeness: For any node, its satisfaction score lies between the maximum
and minimum satisfaction of its neighbors (multiplied by the trust in them).
Mathematically, $s_{i,\min} \le s_i \le s_{i,\max}$, where $s_{i,\min} = \min_j \{t_{ij}s_j \mid (i, j) \in E\}$ and
$s_{i,\max} = \max_j \{t_{ij}s_j \mid (i, j) \in E\}$. As a corollary, the satisfaction score of any user is not greater
than the maximum rating given by a rater. Conservativeness can be interpreted as follows:
while a node receives several recommendations for its satisfaction score, it is extremely
cautious when deciding upon the final satisfaction value. In order to avoid
over-estimation of the document, the satisfaction scores of its neighbors are brought down
by a factor equal to its trust in them. Thus the satisfaction score of a user can be
viewed as a lower bound on his quality estimation of the document.

4. Consistency and Progressiveness: In the event that a non-rater becomes a rater
and provides the article a rating not less than his last estimated satisfaction score,
then the satisfaction of all users in the network either increases or remains the same.
Formally, if $s_j$ is the satisfaction of user $j$ in the system $G_R = ((V, E), T, B, N_R, R)$, and
$s'_j$ is the satisfaction under $G'_R = ((V, E), T, B, N_R \cup \{i\}, R \cup \{r_i\})$ for any $i$ such that
$r_i \ge s_i$, then $s'_j \ge s_j$. Progressiveness is the property that as long as the document is not
edited, the satisfaction score of a user is nondecreasing, assuming that the score
always remains a lower bound for the actual rating of a user. This can be interpreted
as follows: when more users review a (good) article, there is a natural tendency to become less
conservative and increase one's lower bound.

5. Independence: 

(a) From Disconnected users: Removing a node to whom a particular user has no
path whatsoever, and all its associated edges, does not affect the user's satisfaction.

(b) Between Rater and Rater: Removing edges between raters does not alter the 
satisfaction score of any user in the network, for that particular stage of document 
editing. 

6. Irrelevance of Order: The final satisfaction of a user does not depend on the order
in which the raters were chosen. Thus the system depends only on the current input
state $G_R = ((V, E), T, B, N_R, R)$.
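
Properties such as bounded satisfaction and conservativeness are easy to check numerically on any candidate score vector. The checker below is a small sketch; its name, the tolerance `eps`, and the example network are assumptions made for illustration.

```python
def is_bounded_and_conservative(trust, scores, raters, eps=1e-12):
    """Check 0 <= s_i <= 1 for all users and, for every non-rater with neighbours,
    min_j t_ij * s_j <= s_i <= max_j t_ij * s_j."""
    for i, nbrs in trust.items():
        if not (-eps <= scores[i] <= 1.0 + eps):
            return False                      # bounded satisfaction violated
        if i in raters or not nbrs:
            continue                          # conservativeness concerns non-raters only
        values = [t * scores[j] for j, t in nbrs.items()]
        if not (min(values) - eps <= scores[i] <= max(values) + eps):
            return False                      # conservativeness violated
    return True


# Hypothetical example: A trusts raters B (t = 0.5, r = 1.0) and C (t = 1.0, r = 0.25).
# The trust-weighted mean of the recommendations 0.5 and 0.25 is (0.25 + 0.25) / 1.5 = 1/3.
trust = {"A": {"B": 0.5, "C": 1.0}, "B": {}, "C": {}}
scores = {"A": 1.0 / 3.0, "B": 1.0, "C": 0.25}
print(is_bounded_and_conservative(trust, scores, raters={"B", "C"}))
```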

3.1 Existence and uniqueness of the satisfaction scores 

We prove here that the system of linear equations expressed by (3) for non-raters admits a
unique solution, under some mild assumptions that we now clarify. The first convention we
take, still in the spirit of users being cautious while considering document quality, concerns
non-raters who have no trust path (i.e. no path consisting of edges in $E$) connecting them
to any rater.

Assumption A If a user does not trust (even indirectly) any rater, then his satisfaction
score is 0.

We can then prove the following result.

Proposition 3.1 For a given trust network $G_R = ((V, E), T, B, N_R, R)$, the set of
equations (3) defines a unique vector $(s_i)_{i \in V}$ satisfying Assumption A.

Proof As before, we denote the rating of a user $i \in N_R$ by $r_i = R(i)$.
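Existence. Consider the sequence of size-$N$ vectors $(s^{(n)})_{n \in \mathbb{N}}$ defined for all $i$ by:

$$ s_i^{(0)} = \begin{cases} r_i & \text{if } i \in N_R \\ 0 & \text{otherwise,} \end{cases} \qquad (6) $$

$$ \forall n \ge 0, \quad s_i^{(n+1)} = \begin{cases} r_i & \text{if } i \in N_R, \\[1ex] \dfrac{\alpha \sum_{j_1 \in N_R(i)} t_{ij_1}^2 r_{j_1} + (1-\alpha) \sum_{j_2 \in N_N(i)} t_{ij_2}^2 s_{j_2}^{(n)}}{\alpha \sum_{j_1 \in N_R(i)} t_{ij_1} + (1-\alpha) \sum_{j_2 \in N_N(i)} t_{ij_2}} & \text{otherwise.} \end{cases} \qquad (7) $$

Note that we take here the same weights as in (4), so that the notation above becomes

$$ \forall i \notin N_R, \quad s_i^{(n+1)} = \sum_{j \in N(i)} w_{ij} s_j^{(n)}. \qquad (8) $$

It is easy to see that $\forall i \notin N_R$, $0 \le \sum_{j \in N(i)} w_{ij} \le 1$, the right side becoming an equality only
when a node trusts all its neighbours with a trust value of 1.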

We now show by induction that for all nodes $i$, $s_i^{(n)}$ is in the interval [0, 1], is nondecreasing
in $n$, and equals 0 if $i$ has no trust path to any rater.

• We immediately see from (6) and (7) that $s^{(0)}$ and $s^{(1)}$ are in $[0, 1]^N$, and that $s^{(1)} \ge s^{(0)}$.
We also easily remark that if $i$ has no trust path to any rater, then $s_i^{(0)} = s_i^{(1)} = 0$.

• Now take $n_0 \ge 1$, and assume that for every $1 \le n \le n_0$, $s^{(n)} \in [0, 1]^N$, and
$s^{(n)} \ge s^{(n-1)}$.

Each term of $s^{(n_0+1)}$ is a weighted sum of the terms in $s^{(n_0)}$ with nonnegative weights
whose sum is in [0, 1], therefore $s^{(n_0+1)} \in [0, 1]^N$.

Moreover, consider a node $i$ with no trust path to any rater. Then $s_i^{(n_0+1)}$ is a weighted
sum of the scores $(s_j^{(n_0)})_{j \in N(i)}$ of the nodes $j$ that $i$ trusts. But none of those nodes
has a trust path to any rater (otherwise we would have a trust path from $i$ to a rater),
thus $s_j^{(n_0)} = 0$, which implies $s_i^{(n_0+1)} = 0$.

Finally, for every non-rater node $i$, we have from (8)

$$ s_i^{(n_0+1)} - s_i^{(n_0)} = \sum_{j \in N(i)} w_{ij} \left( s_j^{(n_0)} - s_j^{(n_0-1)} \right) \ge 0, $$

where we used the induction hypothesis and the nonnegativity of the weights $w_{ij}$.
Since for each rater $j$, $s_j^{(n_0+1)} = s_j^{(n_0)}$, we have $s^{(n_0+1)} \ge s^{(n_0)}$.
The coordinates of $(s^{(n)})_{n \in \mathbb{N}}$ are therefore nondecreasing and upper bounded by 1.
Consequently, $(s^{(n)})_{n \in \mathbb{N}}$ converges to a vector $s \in [0, 1]^N$ that satisfies Assumption A. From (7) we
see that $s$ also satisfies (3).

Uniqueness. Assume that there exist two distinct vectors $s$ and $\tilde{s}$ satisfying both the
set of equations (3) and Assumption A. Without loss of generality, we can then assume that
there is a node $i \in V$ such that $s_i > \tilde{s}_i$.

Define $i_0 := \arg\max_i (s_i - \tilde{s}_i)$. We can immediately say that node $i_0$ must have a trust
path to a rater (from Assumption A), but is not a rater itself (from (3)). Then, from (5) we
should also have

$$ s_{i_0} - \tilde{s}_{i_0} = \sum_{j \in N(i_0)} w_{i_0 j} (s_j - \tilde{s}_j) \le (s_{i_0} - \tilde{s}_{i_0}) \sum_{j \in N(i_0)} w_{i_0 j} \le s_{i_0} - \tilde{s}_{i_0}, $$

where the first inequality comes from the definition of $i_0$ and the positivity of $(w_{i_0 j})_{j \in N(i_0)}$,
and the last one from the fact that $\sum_{j \in N(i_0)} w_{i_0 j} \le 1$. We then obtain that these inequalities
are actually equalities, and since the weights $(w_{i_0 j})_{j \in N(i_0)}$ are strictly positive this implies
that $\sum_{j \in N(i_0)} w_{i_0 j} = 1$ and

$$ \forall j \in N(i_0), \quad s_j - \tilde{s}_j = s_{i_0} - \tilde{s}_{i_0}. $$

In particular, this means that no node in $N(i_0)$ is a rater (otherwise we would have a
contradiction with (3), as $s_j - \tilde{s}_j$ would equal zero for the rater and $\sum_{j \in N_N(i_0)} w_{i_0 j} < 1$).

But we can apply the exact same reasoning that we used for $i_0$ to all nodes in $N(i_0)$,
and recursively show that $i_0$ does not trust (even indirectly) any rater, a contradiction.
Consequently there can be only one solution to (3) satisfying Assumption A.
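
As a small worked illustration (the three-user network and all numbers are hypothetical, and Equation (3) is assumed to be the $\alpha$-weighted average described in Section 3), take one rater B with $r_B = 1$ and two non-raters A and F, where A trusts B and F, F trusts only A, every trust value equals $1/2$, and $\alpha = 3/4$. The two unknowns then satisfy

$$ s_F = \frac{(1-\alpha)\, t_{FA}^2\, s_A}{(1-\alpha)\, t_{FA}} = \tfrac{1}{2} s_A, \qquad s_A = \frac{\alpha\, t_{AB}^2\, r_B + (1-\alpha)\, t_{AF}^2\, s_F}{\alpha\, t_{AB} + (1-\alpha)\, t_{AF}} = \tfrac{3}{8} + \tfrac{1}{8} s_F. $$

Substituting the first relation into the second gives $s_A = \tfrac{3}{8} + \tfrac{1}{16} s_A$, hence

$$ s_A = \frac{3/8}{15/16} = \tfrac{2}{5}, \qquad s_F = \tfrac{1}{5}. $$

Although A and F depend on each other through a cycle, the system has exactly one solution, and both scores stay below the single rating $r_B = 1$.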



3.2 Algorithm to calculate satisfaction 

In the previous section, we observed that the satisfaction scores of non-raters can be obtained
as a solution of a system of $N - |N_R|$ linear equations. We argue why this approach is not
computationally efficient when the number of users in the community is large, and present a
more efficient iterative algorithm that generates solutions up to desired degrees of accuracy.

Commonly used computer packages and toolboxes solve systems of linear equations by
some variant of Gaussian Elimination [29, Chapter 2]. The standard Gaussian Elimination
uses a series of row operations to convert a given system $Ax = b$ to the form $Ux = y$
(LU decomposition) where $U$ is an upper triangular matrix, and then solves the latter by
backward substitution. It is well known that while Gaussian Elimination requires $O(N^3)$
mathematical operations, its actual time complexity is much worse even with the most
efficient of implementations [18].

Most of today's social networks boast large registered user bases, and this number
is only expected to grow with the proliferation of web-based technologies and high-speed
Internet. Hence, the social graph can contain anywhere between a thousand and a million nodes,
and it is not feasible to execute even an $O(N^3)$ algorithm, let alone one with complexity worse than
cubic, which is normally the case. Moreover, the memory space required is also of the order
of $O(N^2)$ or greater. This implies that any mechanism which has to repeatedly calculate
satisfaction scores every time a non-rater becomes a rater requires a computationally efficient
algorithm that uses the structure of the graph and existing satisfaction scores to estimate
new satisfaction values. Such an algorithm, which effectively exploits the sparse nature of
these graphs, is proposed here.

In order to maintain consistency of notation we use $(s^{(n)})_{n \in \mathbb{N}}$ to represent the column
vector of all satisfaction scores at the $n$-th iteration of the satisfaction computation algorithm
and $s_i^{(n)}$ for the satisfaction score of the $i$-th user after the $n$-th iteration. The algorithm to be
discussed uses the following $N \times N$ matrix $A$:

\[
A_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \quad \text{if $i$ is a rater,}
\qquad
A_{ij} = \begin{cases} w_{ij} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases} \quad \text{if $i$ is a non-rater.}
\]



Based on the above matrix, we define our iterative algorithm as: 



Algorithm 2 Iterative Satisfaction Algorithm
1: $T \leftarrow$ Maximum Tolerance/Error
2: $M \leftarrow$ Maximum Number of Iterations
3: $n \leftarrow 0$
4: $s^{(0)} \leftarrow e_0$ {$e_0$ represents a column vector of all zeros}
5: while $\exists j, \; (s_j^{(n)} - s_j^{(n-1)}) > T$ and $n < M$ do
6: $n \leftarrow n + 1$
7: $s^{(n)} = A \cdot s^{(n-1)}$
8: end while


The algorithm uses matrix-vector multiplications in order to arrive at the desired vector
of satisfaction values, which (approximately) satisfy Equation (3). Normal implementations
use $O(N^2)$ operations for each matrix-vector product and thus the overall complexity of the
algorithm is $O(MN^2)$. As long as $M < N$, Algorithm 2 provides a distinct advantage
over directly solving the linear equations. However, in general, the underlying social graph
of the community is very sparse as each user trusts only a limited number of other users,
especially when the total number of users who have access to the document is very large, i.e.
$\forall i, \; |N(i)| \ll N$. In such cases, there are very efficient representations for sparse matrices like
Compressed Row Storage (CRS) [9], which use space only in the order of $O(n_{nz} + N)$, where
$n_{nz}$ is the total number of non-zero entries in the trust matrix or, in other words, the number
of edges in the social graph ($|E|$). CRS requires only $O(n_{nz})$ operations for matrix-vector
multiplications, which is a significant increase in efficiency over $O(N^2)$ as long as the matrix
is sparse.
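
As a rough illustration of Algorithm 2 on a sparse graph, the sketch below stores only the outgoing trust edges of each user (an adjacency-list stand-in for CRS) and repeats the update until the largest change falls below the tolerance. The form of the update is an assumption on our part: we read Equation (3) as the weighted average that appears in the expressions of Section 4.2, with raters pinned to their ratings. The names `iterate_satisfaction`, `trust`, `ratings`, `tol` and `max_iter` are hypothetical.

```python
def iterate_satisfaction(trust, ratings, alpha=0.5, tol=1e-9, max_iter=1000):
    """Approximate satisfaction scores by repeated sparse updates.

    trust   : dict mapping user -> {trusted user: trust value in (0, 1]}
    ratings : dict mapping rater -> rating in [0, 1]
    Assumed update for a non-rater i (our reading of Equation (3)):
        s_i = (alpha * sum_{raters j} t_ij * r_j
               + (1 - alpha) * sum_{non-raters j} t_ij * s_j)
              / (alpha * #raters + (1 - alpha) * #non-raters)
    """
    users = set(trust) | set(ratings)
    for nbrs in trust.values():
        users |= set(nbrs)
    # Raters keep their rating; non-raters start from zero.
    s = {u: ratings.get(u, 0.0) for u in users}
    for _ in range(max_iter):
        new_s = dict(s)
        for i in users:
            nbrs = trust.get(i, {})
            if i in ratings or not nbrs:
                continue  # raters are fixed; users trusting nobody stay at 0
            num = den = 0.0
            for j, t in nbrs.items():
                w = alpha if j in ratings else 1.0 - alpha
                num += w * t * s[j]
                den += w
            new_s[i] = num / den if den > 0 else 0.0
        change = max(abs(new_s[u] - s[u]) for u in users)
        s = new_s
        if change <= tol:
            break
    return s


# Tiny example: "a" trusts the rater "r"; "b" trusts both "a" and "r".
scores = iterate_satisfaction(
    {"a": {"r": 0.8}, "b": {"a": 0.5, "r": 1.0}}, {"r": 1.0}
)
```

Each sweep touches every stored edge once, so its cost is proportional to the number of edges rather than to $N^2$.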

It is easy to see that the convergence of Algorithm 2 follows from the existence proof
given in Section 3.1. The vectors $(s^{(n)})_{n \in \mathbb{N}}$ obtained after every iteration of the above
algorithm are the same as the vectors of the sequence defined in that proof (see Equation (7)). We conclude that
$s^{(n)}$ converges to a unique vector $s \in [0, 1]^N$ that satisfies Equation (3).

4 Analytical Results in Random Graphs 

The Erdős-Rényi model of Random Graphs is an interesting, yet simple mathematical tool
that allows us to analyze social and technological networks. In this model, every possible
edge in a graph has a certain probability of being present, and is independent of every
other edge. We denote by $G(N, p)$ a directed random graph with $N$ nodes where $p$ is the
probability that any given edge is present. By straightforward reasoning, the expected
number of edges in the graph is $pN(N - 1)$. Since each edge can be viewed as a Bernoulli
random variable with probability of success $p$, the degree of a node follows the Binomial
distribution. If we define a parameter $\lambda$ such that $p = \lambda/N$, then the expected degree of
a node comes out to be $\approx \lambda$. Keeping the expected degree constant as $N \to \infty$, the node
degree $D$ becomes a Poisson random variable with parameter $\lambda$:

\[
\mathbb{P}(D = d) = e^{-\lambda} \frac{\lambda^d}{d!}. \tag{9}
\]

The behavior of random graphs as $\lambda$ varies has been well established: when $\lambda < 1$, the
connected components are small and no larger than $O(\log N)$; when $\lambda > 1$, a giant connected
component of size $O(N)$ emerges. For a thorough treatment of random graphs, the reader
is advised to go through the text by Béla Bollobás [12].
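
The Poisson limit of the degree distribution can be checked numerically. The sketch below compares the Binomial probabilities of the out-degree in $G(N, p)$ with $p = \lambda/N$ against the Poisson probabilities for the same $\lambda$; the values $\lambda = 3$ and $N = 10000$, and the function names, are illustrative assumptions.

```python
import math


def binomial_pmf(n, p, d):
    """P(D = d) when each of n possible out-edges is present with probability p."""
    return math.comb(n, d) * p**d * (1 - p) ** (n - d)


def poisson_pmf(lam, d):
    """Limiting degree distribution with parameter lam."""
    return math.exp(-lam) * lam**d / math.factorial(d)


lam, N = 3.0, 10_000
p = lam / N
# Largest pointwise gap between the two distributions over small degrees.
gap = max(abs(binomial_pmf(N - 1, p, d) - poisson_pmf(lam, d)) for d in range(20))
```

For these values the gap is well below $10^{-3}$, which is why the analysis that follows works with the Poisson form directly.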

In this section, we focus on the choice of the set of raters. Our goal is to compare the 
performance (in terms of proportions of satisfied users) of different rater sets, and possibly 
minimize the total effort needed to validate a document. In order to reach analytical results 
in that direction, we consider here a specific type of instance, satisfying the assumptions 
below. 

Assumption B The trust relationship graph $(V, E)$ is an Erdős-Rényi graph with parameter
$\lambda$, and the number of nodes tending to infinity.

Assumption C The trust values and the rater behavior are such that:

• for any edge $(i, j) \in E$, the trust value is the same and is denoted by $t_{ij} = t$,

• all raters evaluate the document with the same rating $r > 0$.

Considering trust values $t_{ij}$ (resp. rater evaluations $r_j$) to be the same for all pairs of
connected nodes (resp. raters) is a very strong assumption. We however make that choice in
order to highlight the influence of the structure of the social network, that is the topology of
the graph $(V, E)$, on the resulting satisfaction values of users. In the end, we wish to exploit
the topology to optimize the selection of the raters (e.g., to minimize the total evaluation
effort over the network). Having heterogeneous trust values would bring some added richness
to the model, which should be taken into account in the rater selection decision, an interesting
extension of the scheme (greedy algorithm) we develop here. On the other hand, relaxing
the assumption of all raters giving the same score seems less interesting, since in practice
the score set by a rater cannot be predicted (and thus, considered as an input for the rater
selection problem): due to the linearity of the satisfaction calculation model, we expect
that assuming the scores to be randomly distributed with mean value $r$ would yield similar
results.

4.1 Distribution of satisfaction scores with uniformly selected raters 

Suppose that Assumptions B and C hold. We are interested in calculating the (cumulative)
distribution $F$ of satisfaction scores among non-rating nodes, given that a fraction $k$ of nodes
are raters, picked uniformly among the users. In other words, when we pick a non-rater,
$F(x)$ is the probability that his satisfaction score is below or equal to $x$, for $x \in [0, 1]$. By
definition, $F(0) = c$, where $c$ is the fraction of users in the connected components with no
raters, and $F(1) = 1$. The symmetry inherent in random graphs and the fact that every node
has an equal probability of being a rater ($k = |N_R|/N$) ensure that the satisfaction scores of
non-raters can be treated as independent and identically distributed random variables.

We start progressively, considering $d$ the number of nodes that a non-rater trusts, and
computing $F_d(x)$, the probability that the satisfaction score of the node is below $x$
given that the node has $d$ trustees:

• If $d = 0$, then the satisfaction score of the node is 0, and $F_0(x) = 1, \forall x$.

• If $d = 1$, then the satisfaction score is the one of the neighbor, multiplied by the trust
value $t$. That neighbor is:

- a rater with probability $k$ (thus, with satisfaction score $r$ under Assumption C).
Then the probability of the node satisfaction being below $x$ is $\mathbb{1}_{\{tr \le x\}}$, where $\mathbb{1}_{\{A\}}$
equals 1 if condition $A$ is satisfied and 0 otherwise.

- a non-rater with probability $1 - k$ (thus, with a satisfaction score distributed
according to $F$). Then the satisfaction score of our node is below $x$ if and only
if the satisfaction score of the trustee is below $x/t$, which occurs with probability
$F(\min(x/t, 1))$.

As a result, if $d = 1$ the expected probability that the satisfaction score of the node is
below $x$ is

\[
F_1(x) = k \, \mathbb{1}_{\{tr \le x\}} + (1 - k) F(\min(x/t, 1)).
\]

• If $d \ge 2$, we use conditional probabilities as in the previous case, varying the number
$\ell$ of neighbors who are raters: the probability that there are $\ell$ raters among the $d$
neighbors is $C_d^\ell k^\ell (1 - k)^{d - \ell}$, with $C_d^\ell = \frac{d!}{\ell! (d - \ell)!}$. If we denote by $f$ the (unknown)
probability density function of $F$, then we have:



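\[
F_d(x) = \sum_{\ell=0}^{d} C_d^\ell k^\ell (1 - k)^{d - \ell} \; \mathbb{P}\Big( \sum_{\text{non-rater neighbors } i} s_i \le \mu x - c \ell r \Big), \tag{10}
\]

where $\mu = \frac{1}{t} \big( c\ell + (d - \ell) \big)$ and $c = \frac{\alpha}{1 - \alpha}$, the scores $s_i$ of the non-rater neighbors being independent with density $f$.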



Finally, the fixed point equation that the distribution $F$ (or the density $f$) of satisfaction
scores among non-raters should satisfy is the following:

\[
\forall x \in [0, 1], \qquad F(x) = e^{-\lambda} \sum_{d=0}^{\infty} \frac{\lambda^d}{d!} F_d(x), \tag{11}
\]

where $F_d(x)$ is given in (10).



4.2 Bound on the number of raters needed to validate a document 

In order to prove that the model defined in Section 3 results in improved efficiency, we
need to show bounds on the number of raters required to satisfy a sufficient proportion of
the collaborators. In other words, we argue that the Satisfaction Estimation Model leads
to reduced human effort with respect to a situation where each user should evaluate the
document separately to validate it. We show here an upper bound on the maximum number
of unsatisfied users in the system, depending on the proportion $k$ of raters. The result is
derived by using the reverse Markov inequality on the expected satisfaction value of users.
In Section 6, we show the actual number of raters required to pass the document, obtained
via extensive simulations in various network conditions.

We first claim that for any instance of $G_R = ((V, E), T, B, N_R, R)$, $\forall i \in V$ and any value
$\alpha > 0.5$, we have:

\[
s_{i,\alpha} \ge s_{i,(\alpha = 0.5)}, \tag{12}
\]

where $s_{i,\alpha}$ represents the satisfaction score of user $i$ that is obtained when users apply a
weight $\alpha$ to the scores of raters in their satisfaction computation (3).

This means that the satisfaction score of a non-rater is always greater when raters are
given more weight than when equal weight is given to the satisfaction of raters and
non-raters. This can be proved inductively by using an approach similar to the proof of existence
given in Section 3.1 and using as the induction hypothesis:

\[
\frac{\alpha \ell r + (1 - \alpha) \sum_{j \in N_N(i)} s_j}{\alpha \ell + (1 - \alpha)(d - \ell)}
\ge
\left. \frac{\alpha \ell r + (1 - \alpha) \sum_{j \in N_N(i)} s_j}{\alpha \ell + (1 - \alpha)(d - \ell)} \right|_{\alpha = 0.5},
\]

where the right-hand side is taken in the case when $\alpha = 0.5$, and $d$, $\ell$, $r$ are the number of users that the non-rater trusts, the
number of raters among those, and the (common) score set by raters respectively, as defined
previously. We now derive the expected satisfaction score of a non-rater as a function of
the proportion $k$ of raters, when $\alpha = 0.5$. The expected satisfaction score when $\alpha > 0.5$ is
always greater than this value and hence the bounds obtained hold for all cases.

Proposition 4.1 Suppose Assumptions A, B and C hold, and that a proportion $k$ of raters
is randomly selected among the set of users, according to a uniform law. Then if users weigh
non-raters' opinions as much as raters' scores (i.e. $\alpha = 0.5$), the expected satisfaction score
among non-raters is

\[
\frac{t r k (1 - e^{-\lambda})}{1 - t(1 - k)(1 - e^{-\lambda})}. \tag{13}
\]



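Proof Denoting by $D$ and $L$ the random variables giving the number of nodes trusted by $i$
(degree of node $i$) and the number of raters among those trustees, respectively, we have

\[
\begin{aligned}
\mathbb{E}(s_i) &= \sum_{d=0}^{N-1} \mathbb{E}(s_i \mid D = d) \, \mathbb{P}(D = d) \\
&= \sum_{d=1}^{N-1} \mathbb{E}(s_i \mid D = d) \, \mathbb{P}(D = d) \\
&= \sum_{d=1}^{N-1} \left[ \sum_{\ell=0}^{d} \mathbb{E}(s_i \mid L = \ell, D = d) \, \mathbb{P}(L = \ell \mid d) \right] \mathbb{P}(D = d),
\end{aligned}
\]

where $\mathbb{P}(A \mid B)$ refers to the probability of $A$ conditional on $B$, and the second line comes
from Assumption A.

Now, for $d \ge 1$ we have:

\[
\mathbb{E}(s_i \mid L = \ell, D = d) = \frac{t}{d} \, \mathbb{E}\Big[ \ell r + \sum_{j \in N_N(i)} s_j \Big].
\]

By linearity of expectation, and using the notation $s := \mathbb{E}(s_j)$, we have

\[
\begin{aligned}
\mathbb{E}(s_i \mid \ell, d) &= \frac{t}{d} \big( \ell r + (d - \ell) s \big), \\
\mathbb{E}(s_i \mid d) &= \frac{t}{d} \sum_{\ell=0}^{d} C_d^\ell k^\ell (1 - k)^{d - \ell} \big( \ell r + (d - \ell) s \big) \\
&= \frac{t}{d} \big( r d k + s d (1 - k) \big) \quad \Big( \text{using that } \sum_{\ell} \ell \, C_d^\ell k^\ell (1 - k)^{d - \ell} = \mathbb{E}[L \mid D = d] = dk \Big) \\
&= t \big( r k + s (1 - k) \big).
\end{aligned}
\]

Interestingly, this is independent of $d$ (when $d \ge 1$). Two cases therefore arise.

Case I: $d = 0$

\[
\mathbb{E}(s_i) = 0 \ \text{(by Assumption A)}, \qquad \mathbb{P}(D = 0) = e^{-\lambda}.
\]

Case II: $d \ge 1$

\[
\mathbb{E}(s_i) = t \big( r k + s (1 - k) \big), \qquad \mathbb{P}(D = d) = e^{-\lambda} \frac{\lambda^d}{d!}.
\]

Hence, we have

\[
\begin{aligned}
\mathbb{E}(s_i) &= t \sum_{d=1}^{\infty} e^{-\lambda} \frac{\lambda^d}{d!} \big( r k + s (1 - k) \big), \\
s &= t (1 - e^{-\lambda}) \big( r k + s (1 - k) \big) \\
\Longrightarrow \quad s \big( 1 - t (1 - k)(1 - e^{-\lambda}) \big) &= t r k (1 - e^{-\lambda}),
\end{aligned}
\]

which establishes the proposition.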
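
The closed form of Proposition 4.1 is easy to evaluate and to check against the fixed-point relation $s = t(1 - e^{-\lambda})(rk + s(1 - k))$ from which it is derived. The following minimal sketch does both; the function names and the sample parameter values are illustrative assumptions, not values taken from the experiments.

```python
import math


def expected_satisfaction(t, r, k, lam):
    """Closed form of the expected non-rater satisfaction for alpha = 0.5."""
    q = t * (1.0 - math.exp(-lam))  # trust value times P(D >= 1)
    return q * r * k / (1.0 - q * (1.0 - k))


def fixed_point_residual(s, t, r, k, lam):
    """Distance of s from solving s = t(1 - e^{-lam})(rk + s(1 - k))."""
    q = t * (1.0 - math.exp(-lam))
    return s - q * (r * k + s * (1.0 - k))


s_bar = expected_satisfaction(t=0.5, r=1.0, k=0.2, lam=20.0)
residual = fixed_point_residual(s_bar, t=0.5, r=1.0, k=0.2, lam=20.0)
```

With these sample values the expected score is close to 1/6 and the residual vanishes up to rounding. The denominator is positive whenever $t(1 - e^{-\lambda})(1 - k) < 1$, which holds as soon as $k > 0$.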

We can now use this result to provide bounds on the number of raters that are needed 
so that a target proportion of the community is satisfied with the document. 

Proposition 4.2 Suppose Assumptions A, B and C hold, and assume that the minimum
quality requirements of users are all equal to some value $b \in (0, 1)$ such that $r > b$. Then if
raters are picked uniformly among the set of users, for a proportion $T$ of the non-raters to
be satisfied it is necessary that the proportion of raters be at least

\[
k^{\min}(T) = \frac{b T \big( 1 - t(1 - e^{-\lambda}) \big)}{t (1 - e^{-\lambda})(r - bT)}, \tag{14}
\]

while it is sufficient to choose a proportion of raters

\[
k^{\max}(T) = \frac{\big( b + T(1 - b) \big) \big( 1 - t(1 - e^{-\lambda}) \big)}{t (1 - e^{-\lambda}) \big( r - b - T(1 - b) \big)}. \tag{15}
\]

Consequently, to satisfy a proportion $\bar{T}$ of the whole community:

• it is necessary that the proportion of raters be at least $k^{\min}(T^{\min})$, with $T^{\min}$ the solution
of the equation $(1 - k^{\min}(T))(1 - T) = 1 - \bar{T}$,

• it is sufficient that the proportion of raters be at least $k^{\max}(T^{\max})$, with $T^{\max}$ the solution
of the equation $(1 - k^{\max}(T))(1 - T) = 1 - \bar{T}$.

Proof We can apply Markov's inequality to obtain a lower bound on the number of
unsatisfied users:

\[
\mathbb{P}(s_i < b) \ge 1 - \frac{\mathbb{E}(s_i)}{b}.
\]

Substituting (13) in this equation, we get

\[
\mathbb{P}(s_i < b) \ge \frac{b - t(1 - e^{-\lambda})(rk + b(1 - k))}{b \big( 1 - t(1 - k)(1 - e^{-\lambda}) \big)},
\]

which leads to (14) once we write the target condition $\mathbb{P}(s_i < b) \le 1 - T$.
Again applying the Markov inequality, we obtain

\[
\mathbb{P}(s_i < b) = \mathbb{P}(1 - s_i > 1 - b) \le \frac{1 - \mathbb{E}(s_i)}{1 - b}. \tag{16}
\]

Then substituting (13) in the above equation, we get

\[
\mathbb{P}(s_i < b) \le \frac{1 - t(1 - e^{-\lambda})(1 - k(1 - r))}{(1 - b) \big( 1 - t(1 - k)(1 - e^{-\lambda}) \big)}, \tag{17}
\]

which yields (15).

Now, since $r > b$ all raters are satisfied, so a proportion $T$ of non-raters being satisfied
corresponds to a proportion $k + (1 - k)T$ of satisfied users (or equivalently, a proportion
$(1 - k)(1 - T)$ of unsatisfied users) over the whole community. This implies the second part
of the proposition.
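
To get a feel for the two bounds of Proposition 4.2, the sketch below evaluates the necessary and the sufficient rater proportions for given parameters. The function name `rater_bounds` and the sample values are illustrative assumptions; the sufficient bound is reported as infinite when $r - b - T(1 - b) \le 0$, since the argument then guarantees nothing.

```python
import math


def rater_bounds(t, lam, r, b, T):
    """Return (k_min, k_max): necessary and sufficient rater proportions."""
    q = t * (1.0 - math.exp(-lam))
    # Necessary proportion, from the Markov lower bound on P(s_i < b).
    k_min = b * T * (1.0 - q) / (q * (r - b * T))
    # Sufficient proportion, from the Markov upper bound on P(s_i < b).
    slack = r - b - T * (1.0 - b)
    if slack > 0:
        k_max = (1.0 - q) * (b + T * (1.0 - b)) / (q * slack)
    else:
        k_max = math.inf
    return k_min, k_max


k_min, k_max = rater_bounds(t=0.9, lam=20.0, r=1.0, b=0.2, T=0.5)
```

Either value may exceed 1 for demanding targets, in which case the corresponding bound is simply uninformative.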



5 Selecting raters to maximize satisfaction 

In this section, we consider an optimization problem that is closely connected to our model,
that of finding the optimal set of raters (given a maximum number of raters) in order to
maximize satisfaction. We show that this problem is NP-Hard irrespective of the chosen
parameters. An approximation algorithm based on the concept of marginal cost and
approximate oracles is then proposed, and the improvement in efficiency obtained with this
algorithm is shown via simulations in the next section.

5.1 Complexity of Optimization Problem 

We define the optimization problem of selecting the best possible raters as follows:
MAXIMUM-SATISFACTION Given:

• a social network represented by $G_\alpha := ((V, E), T, B, r, \alpha)$, where $r$ is the rating that
any rater would give to the article under consideration and all other parameters are as
defined previously,

• and an integer $k$ that denotes the maximum cardinality of the rater set ($|N_R| \le k$),

find the set $N_R \subseteq V$ of maximum size $k$ that satisfies the maximum number of users in the
network according to Equation (3) and Assumption A.

We now prove that this rater selection problem is computationally hard. 

Proposition 5.1 MAXIMUM-SATISFACTION is NP-Hard (even under the
simplifying assumptions).

Proof The problem that we use for our reduction is the NP-Hard problem Maximum
k-Cover, also known as the Maximum Coverage Problem. The problem is stated as follows:
Given a universe $U = \{U_1, U_2, \dots, U_m\}$ and a collection of subsets $S = \{S_1, S_2, \dots, S_n\}$
such that $S_i \subseteq U, \forall i$, find a collection of subsets $S^* \subseteq S$ which maximizes $|\cup_i S^*_i|$ and
satisfies $|S^*| \le k$. In other words, we have to cover as many elements as possible in the
universe. We assume that $(\cup_i S_i) = U$. We now reduce this to an instance of
MAXIMUM-SATISFACTION.

Consider a bipartite graph $G_\alpha = ((V, E), T, B, r, \alpha)$ such that $|V| = m + n$. The $m + n$
nodes are $(U_1, U_2, \dots, U_m)$ and $(S_1, S_2, \dots, S_n)$. There is a directed edge $U_j \to S_i$ iff $U_j \in S_i$.
Let $t_{ij} = t$ for all edges. All of $t$, $\alpha$ and $r$ lie in the range $(0, 1]$. Let $b_i = 0, \forall i \in V$. The
purpose of setting all the thresholds to zero is to ensure that a user would be satisfied as
soon as his satisfaction score becomes non-zero, as $s_i > b_i$ is the condition for satisfaction.

Claim Any algorithm for Maximum Satisfaction of $(G_\alpha, k)$ also yields a solution for
Maximum k-Cover.

Lemma 5.2 Any algorithm that selects raters to maximize satisfaction only selects nodes 
from S. 

Suppose an optimal solution contains the node $U_j \in U$; we can replace $U_j$ by $S_i \in S$ such that
$(U_j, S_i) \in E$ and the solution still remains optimal. Such an $S_i$ exists because $(\cup_k S_k) = U$.
In the case when $S_i$ already belongs to the set $S^*$, then the solution is improved by one,
which contradicts our claim that the previous solution is optimal. Therefore, there exists an
optimal solution with nodes only from $S$, which the algorithm finds.

$\forall U_j$, $s_j > 0$ iff the optimal solution contains a set $S_i$ which has an incoming edge from
$U_j$, i.e. $U_j$ trusts $S_i$. Thus any algorithm for Maximum Satisfaction selects $k$ nodes from $S$
such that the number of satisfied nodes in $U$ is maximized. This completes our proof that
the MAXIMUM-SATISFACTION problem is NP-Hard.
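
The correspondence used in the reduction can be checked by brute force on a toy instance. In the sketch below, element nodes trust the subset nodes that contain them and all thresholds are zero, so a node is satisfied exactly when its score is positive; the number of satisfied users for a rater set drawn from $S$ is then the number of raters plus the number of covered elements. The function names and the instance are illustrative assumptions, and the exhaustive search is only meant for very small inputs.

```python
from itertools import combinations


def satisfied_users(universe, subsets, raters):
    """Count satisfied nodes in the reduction graph for raters chosen in S.

    Subset nodes trust nobody, so only the chosen raters among them have a
    positive score; an element node has a positive score iff it trusts a rater.
    """
    covered = set()
    for i in raters:
        covered |= subsets[i]
    return len(raters) + len(covered & set(universe))


def best_raters(universe, subsets, k):
    """Exhaustive MAXIMUM-SATISFACTION restricted to subset nodes (Lemma 5.2)."""
    return max(combinations(range(len(subsets)), k),
               key=lambda raters: satisfied_users(universe, subsets, raters))


def max_k_cover(subsets, k):
    """Exhaustive Maximum k-Cover, for comparison."""
    return max(combinations(range(len(subsets)), k),
               key=lambda c: len(set().union(*(subsets[i] for i in c))))


universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {3, 4}, {4, 5}, {1}]
chosen = best_raters(universe, subsets, 2)
```

On this instance both searches return the same pair of subsets, as the proof predicts.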

5.2 Greedy Algorithm to Select Raters 

In the preceding sections, we proposed a model for estimating satisfaction scores of users in a 
collaborative network given the ratings of users who have already read the document. Based 
on this model, we described the complete collaborative editing process. However, we have 
so far avoided specifying any particular method to select raters apart from randomization. 






Although we proved that choosing the best possible users to review a document is NP-Hard, 
it seems likely that selecting raters based on the trust network (in polynomial time), may be 
more efficient than a uniform random choice. Here, we propose an algorithm that greedily 
selects raters who are likely to satisfy the maximum number of non-raters at each stage of the 
document development. In the following section, we show that this algorithm outperforms 
the other greedy algorithms based on trust, and also the random rater selection strategy. 

Quite a few greedy algorithms present themselves as viable solutions once the social
network structure ($G_R = ((V, E), T, B, N_R, R)$) is obtained and the weight $\alpha$ is fixed. On the
surface, it seems like a good idea to select the non-rater on whom other non-raters place the
maximum amount of trust to read and rate the article, i.e. we choose $i$ such that $i \notin N_R$ and
$\sum_{j \notin N_R} t_{ji}$ is maximum over all $i$. However, it appears that this algorithm is short-sighted in
the sense that it only takes into account the satisfaction of the immediate neighbors without
considering the impact of the rater as the rating propagates across the network. We later
show that this is indeed the case. Therefore, we aim for a greedy algorithm that selects the
user who has the maximum impact on all users across the network.

Before presenting our algorithm, it is useful to dwell upon the concept of Marginal Cost,
which we borrow from economics. Marginal cost (MC) is a concept that is used to indicate
the change in total cost that arises when the quantity produced changes by one unit. That
is, it is the cost of producing one extra unit of a good. If the total cost function $TC$ is
differentiable, this means that

\[
MC = \frac{d\,TC}{dQ},
\]

where $Q$ is the independent variable denoting the quantity being produced. Typically, $TC$
may be a linear or nonlinear function of $Q$. It is not uncommon in real problems for the
variable $Q$ to take only discrete values. In such a case the marginal cost is redefined in a
macroscopic sense, that is,

\[
MC = \frac{\Delta TC}{\Delta Q}.
\]

Let $U : G_R \to \mathbb{N}$ be a function that maps any particular state of a social network to the
number of satisfied users in that network, i.e. $U(G) = |S|$, where $S$ is the set of users in the
system whose satisfaction scores are greater than their thresholds. Then, we define a quantity
Marginal Satisfaction (MS) associated with each non-rater to be the number of non-raters
newly satisfied when that particular user is chosen to rate the document. Mathematically,

\[
MS_i = f(G_R, i) = U(G_R + \{i\}) - U(G_R). \tag{18}
\]

It is important to note the abuse of notation: all operations on $G_R$ are performed only on
the set of raters, i.e. $G_R + \{i\}$ denotes the same social network as $G_R$ except that user $i$
is now a rater. We assume that all raters provide the document a rating of $r$, or that we
have the same a priori information about the scores that raters will provide, so that $r$ is the
expected value of those scores. The value of $r$ can be increased at each successive stage of
document editing to model the increasing quality of the document. Our greedy algorithm
works as follows:

Algorithm 3 Greedy Algorithm based on Marginal Satisfaction to Select Raters
1: while $\exists j, \; s_j < b_j$ do
2: $\forall i \notin N_R$, calculate $f(G_R, i)$
3: select user $i^*$ such that $f(G_R, i^*) \ge f(G_R, i) \; \forall i$
4: $N_R \leftarrow N_R \cup \{i^*\}$
5: update $s_j \; \forall j$ using Equation (3)
6: end while



At each stage, the algorithm chooses the non-rater with the maximum marginal
satisfaction. We remark that this user may already be satisfied. The marginal satisfaction
for each user can be approximated in polynomial time by running Algorithm 2 for a fixed
number of iterations. Thus, the iterative algorithm proposed serves as an approximate oracle.
The only disadvantage of this approach when compared to the random selection of raters or
other greedy algorithms is the added complexity in calculating Marginal Satisfaction for all
users at every stage.
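
The greedy selection with an approximate oracle can be sketched as follows. This is only an illustration under stated assumptions: the score update is our reading of Equation (3) as a weighted average over trusted users, a fixed number of sweeps plays the role of the oracle, every user is assumed to have a threshold below the common rating $r$ (so that each new rater is satisfied and the loop terminates), and the names `solve_scores`, `count_satisfied` and `greedy_raters` are hypothetical.

```python
def solve_scores(users, trust, raters, r, alpha, sweeps=100):
    """Fixed number of sweeps of the assumed satisfaction update."""
    s = {u: (r if u in raters else 0.0) for u in users}
    for _ in range(sweeps):
        new_s = dict(s)
        for i in users:
            nbrs = trust.get(i, {})
            if i in raters or not nbrs:
                continue  # raters are pinned; users trusting nobody stay at 0
            num = den = 0.0
            for j, t in nbrs.items():
                w = alpha if j in raters else 1.0 - alpha
                num += w * t * s[j]
                den += w
            new_s[i] = num / den if den > 0 else 0.0
        s = new_s
    return s


def count_satisfied(scores, thresholds):
    """Number of users whose score exceeds their threshold."""
    return sum(1 for u, b in thresholds.items() if scores[u] > b)


def greedy_raters(trust, thresholds, r=1.0, alpha=0.5):
    """Repeatedly add the non-rater with the largest marginal satisfaction."""
    users = sorted(thresholds)
    raters = set()
    scores = solve_scores(users, trust, raters, r, alpha)
    while count_satisfied(scores, thresholds) < len(users):
        base = count_satisfied(scores, thresholds)

        def marginal(i):
            trial = solve_scores(users, trust, raters | {i}, r, alpha)
            return count_satisfied(trial, thresholds) - base

        best = max((u for u in users if u not in raters), key=marginal)
        raters.add(best)
        scores = solve_scores(users, trust, raters, r, alpha)
    return raters


# Star network: users 1..4 each trust user 0 with trust value 0.9.
star = {u: {0: 0.9} for u in range(1, 5)}
selected = greedy_raters(star, {u: 0.5 for u in range(5)})
```

On the star network a single rater, the hub, satisfies everyone. Each stage calls the oracle once per remaining non-rater, which is the added cost mentioned above.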

However, in the case when $\alpha = 0.5$, we can exploit the linearity of our model and the
fact that the weights in Equation (3) remain constant as we increase the number of raters, and use a
dynamic programming based approach to calculate the marginal satisfaction of each user. For
this, we define a quantity $\Delta_{N_R}(i, j)$ to be the increase in the satisfaction of non-rater $j$ when
$i$ (also a non-rater) becomes a rater and increases his satisfaction by 1, assuming all other
parameters remain fixed. Then, our algorithm is motivated by the following observation: if
user $k$ becomes a rater, then all trust paths between $i$ and $j$ that pass through $k$ cannot be
considered anymore while calculating $\Delta_{N_R}(i, j)$. Therefore we have

\[
\Delta_{N_R \cup \{k\}}(i, j) = \Delta_{N_R}(i, j) - \Delta_{N_R}(i, k) \times \Delta_{N_R}(k, j).
\]

In other words, due to the linearity of satisfaction scores, the impact that user $i$ has on $j$
considering only paths that pass through $k$ equals the impact of $i$ on $k$ times the impact of $k$
on $j$. Using this quantity, we can calculate the vector containing the increase in satisfaction
values of all non-raters when user $i$ becomes a rater as $(r - s_i) \times \Delta_{N_R}(i)$, where $r$ is the
rating as defined previously, $s_i$ denotes the current satisfaction of $i$ and $\Delta_{N_R}(i)$ is a vector
consisting of $(\Delta_{N_R}(i, j))_{j \notin N_R}$. Adding the above vector to the vector of satisfaction scores,
one can easily calculate the marginal satisfaction of user $i$. Therefore, at each stage one has
to only maintain $\Delta_{N_R}(i, j)$ for all $i, j$ belonging to the set of non-raters, in order to calculate
the marginal satisfaction of all users when $\alpha = 0.5$. This can be done in polynomial time
and is much more efficient than running Algorithm 2 for all users at each stage. In the
following section, we use this method to compare the performance of our greedy algorithm
to other algorithms.

6 Simulation and Results



In Section 4, we gave a fixed point equation to solve for the distribution of satisfaction scores
and showed bounds on the number of users required to review an article in random graphs
to satisfy a given proportion of users. In this section, we are interested in examining the
performance of our proposed model and greedy algorithm in practice, namely in collaborative
editing systems with a large number of users. We study the effect of several network parameters,
including the number of raters and the density of the underlying graph, on user satisfaction.
We first show that our satisfaction estimation model results in a considerable efficiency gain
with respect to the case when all collaborators have to peruse the document, even when
the raters are selected uniformly at random. We then show that our greedy algorithm
outperforms this random selection of raters and other simple greedy algorithms, especially
in the later stages of document development. The main metric used to quantify efficiency
is human effort, which we assume to be proportional to the number of raters. In other
words, the fewer the users who have to review the document, the more efficient
the system.

6.1 Network Parameters 

The model that we simulate has many parameters, namely the number of users, the
structure of the underlying collaborative network, the distribution of trust and the threshold
values of users denoted by the vector $B$. We now describe the parameters chosen for our
experiments.

• Network We consider directed random or Erdős-Rényi graphs with $N = 10000$ users
and edge probability $p$ (i.e. the probability that each of the $N^2$ edges exists). We
express $p$ as $D/N$, where $D$ is the average number of outgoing links per user.

• Trust Each edge $(i, j) \in E$ has a trust value $t_{ij} \in (0, 1)$ associated with it. The trust
values are chosen uniformly from the same interval.

• Threshold All users have a uniform constant threshold value ($b$), except in the final
simulation where the threshold values are chosen from a truncated normal distribution
with effective mean 0.25 and variance 0.144.

• Raters In our reference scenario, the raters are chosen randomly with a probability $k$.
The expected number of raters is therefore $kN$. In the final simulation, we compare
the performance of an algorithm which selects raters randomly, one which selects the
user with the highest incoming trust at each stage, and the algorithm based on marginal
satisfaction (our Algorithm 3).

• Ratings We assume that all raters give the article the same rating $r$, which without
loss of generality equals 1.

• Rater Priority We take the rater weight factor $\alpha$ to be 0.5 in all our simulations.
As we showed in Equation (12), the performance of the system when $\alpha = 0.5$ acts as
a lower bound to its actual performance for larger values of the parameter.

• Document Content A key assumption that we make is that the document content
does not change throughout this process, i.e. each user only rates the article and does
not modify it. This allows us to examine each stage of document development critically.
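
A scaled-down version of the reference scenario can be reproduced with a few lines of code. The sketch below draws a directed random graph, uniform trust values and random raters, runs a fixed number of sweeps of the satisfaction update, and reports the fraction of users whose score does not exceed the threshold. The update rule is our assumed reading of Equation (3); the function name, the default graph size of 300 users and the other defaults are illustrative choices, far smaller than the $N = 10000$ used in the experiments.

```python
import random


def simulate_unsatisfied(n=300, avg_degree=10, k=0.2, b=0.2, r=1.0,
                         alpha=0.5, sweeps=100, seed=0):
    """Fraction of users whose score does not exceed b, with random raters."""
    rng = random.Random(seed)
    p = avg_degree / n
    # Directed random graph; each present edge gets a trust value in (0, 1].
    trust = [{j: 1.0 - rng.random() for j in range(n)
              if j != i and rng.random() < p}
             for i in range(n)]
    # Each user is a rater independently with probability k.
    raters = {i for i in range(n) if rng.random() < k}
    s = [r if i in raters else 0.0 for i in range(n)]
    for _ in range(sweeps):
        new_s = s[:]
        for i in range(n):
            if i in raters or not trust[i]:
                continue  # raters are pinned; isolated users stay at 0
            num = den = 0.0
            for j, t in trust[i].items():
                w = alpha if j in raters else 1.0 - alpha
                num += w * t * s[j]
                den += w
            new_s[i] = num / den if den > 0 else 0.0
        s = new_s
    return sum(1 for x in s if x <= b) / n


fraction = simulate_unsatisfied(n=100, k=0.2, sweeps=20)
```

Sweeping `k` or `avg_degree` over a grid and averaging over several seeds gives curves of the kind discussed in the results below, at a much smaller scale.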

6.2 Results 

All the experiments were performed in systems which follow the core model proposed in
Equation (3) and the Collaborative Editing scheme described in Algorithm 1. In each case,
the simulation was repeated a number of times and mean values were plotted in order to obtain
smooth curves and average out arbitrary variations.

Effect of the number of raters 




Figure 3: Fraction of unsatisfied users with increasing number of raters for three different 
threshold values 



We first investigate the effect of the rater density (or alternatively the total number of
raters) on the satisfaction values of non-raters in Figure 3. We analyze this separately for
three different threshold values b = 0.2, b = 0.3 and b = 0.4 and for edge density p = ^^/loooo, 
which results in a dense and well-connected network. As expected, increasing the number
of raters decreases the total dissatisfaction in the network. We denote by $k_{\min}$
the minimum fraction of users required to rate the article so that only 1% of the community
remains dissatisfied; the curves indicate a near-linear dependence of $k_{\min}$ on the threshold $b$. In
other words, increasing the threshold shifts the curve almost uniformly to the right. The
fact that only a fraction 0.2 of the total users have to review an article in order to satisfy the
majority of the community points to the efficiency gained due to our model, even when the
graph structure is not exploited and raters are selected randomly.

Effect of edge density

Figure 4: Impact of edge density on user satisfaction for different rater densities



We vary the edge probability $p$ from 0 to 0.1 and plot the fraction of unsatisfied users
for different values of rater densities. The value $p = 0.1$ for $N = 10000$ indicates that each
user has, on average, 1000 outgoing edges. Beyond this value, the network becomes highly
dense and unrealistic. Hence we study the effect of edge probability only up to this value.
The satisfaction threshold was uniformly chosen to be $b = 0.2$ for all simulations. For lower
values of rater density, the number of unsatisfied users in the network sharply decreases
as the edge probability increases initially and then settles at a larger value upon further
increase. This points to the fact that beyond a point, increasing the number of edges does
not improve user satisfaction and has a contrary effect as long as the number of raters is
sufficiently low. The value $k = 0.1$ acts as a critical point after which the average satisfaction in the
network gradually increases upon increasing the number of raters. Around the $k_{\min}$ value
for the network ($k = 0.16$), increasing the edge probability after a certain point has little
or no effect on user satisfaction as most users remain satisfied. This phenomenon can be
explained as follows: when the proportion of raters is low, those raters have a smaller impact
in a dense network. Indeed, the nodes that trust a rater also trust several non-raters (and
potentially, nodes with no trust path to any rater), which decreases their satisfaction. On
the other hand, in a sparse network the raters are trusted only by a few users, but these
users are not influenced much by several nodes, and thus are more easily convinced. When
the proportion of raters is sufficiently large, the results are in the intuitive direction: a dense
network increases the likelihood of trusting a rater, and thus of being satisfied.

Performance of greedy algorithms 




Figure 5: Comparison of three different algorithms to choose raters for two different values
of edge probability



Finally, we compare the performance of the greedy algorithm based on marginal satisfaction 
proposed in Section 5.2 to other algorithms for two values of edge probability, p = ^^/10000 
and p = 50/10000. The trust-based greedy algorithm chooses, at each stage, the user whose 
sum of incoming trust from other users is maximum, i.e. it selects a user whom other users 
trust the most (select the non-rater i for which $\sum_j t_{ji}$ is maximum). The random 
raters algorithm selects raters uniformly at random from the network. Although initially 
both greedy algorithms perform similarly, the trust-based algorithm suffers from a slow 
finish and takes almost twice as many raters to satisfy the entire network as the 
marginal satisfaction algorithm. This shows that initially the users on whom others have high 
trust are good choices for raters. However, the underlying dynamics are much more complex 
as the number of raters increases and the individual effect of each rater depends not only on 
his trustworthiness but also on the threshold of his neighbors. Our simulations show that 
our greedy algorithm is more effective for sparse graphs than dense graphs, as increasing the 
edge probability reduces the difference between the two greedy algorithms. It is also evident 
that the performance of the random algorithm is quite poor as opposed to those which select 
raters based on the graph structure and user trust. 
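
The two baseline selection rules described above can be sketched as follows. This is a minimal illustration, not the code used for the simulations: the function names, the matrix layout (`trust[j][i]` holding t_ji) and the tie-breaking by lowest index are assumptions made here. The marginal-satisfaction algorithm is not reproduced because it depends on the satisfaction model defined in earlier sections.

```python
import random


def select_by_incoming_trust(trust, k):
    """Trust-based greedy rule: repeatedly pick the non-rater with the
    largest sum of incoming trust.

    trust[j][i] is t_ji, the trust user j places in user i (0.0 if no edge).
    Ties are broken by lowest index (an assumption of this sketch).
    """
    n = len(trust)
    incoming = [sum(trust[j][i] for j in range(n) if j != i) for i in range(n)]
    candidates = set(range(n))
    raters = []
    for _ in range(min(k, n)):
        best = max(sorted(candidates), key=lambda i: incoming[i])
        raters.append(best)
        candidates.remove(best)
    return raters


def select_random(n, k, rng):
    """Random rule: choose k distinct raters uniformly at random."""
    return rng.sample(range(n), min(k, n))


if __name__ == "__main__":
    # Toy 3-user network; the values are purely illustrative.
    toy = [
        [0.0, 0.9, 0.1],
        [0.2, 0.0, 0.3],
        [0.4, 0.8, 0.0],
    ]
    print(select_by_incoming_trust(toy, 2))  # incoming sums 0.6, 1.7, 0.4 -> [1, 0]
    print(select_random(3, 2, random.Random(0)))
```

Because the incoming-trust sums do not change as raters are added, this rule amounts to taking the top-k users by incoming trust, which is consistent with the observation that it ignores how raters interact with the thresholds of their neighbors.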

7 Conclusions and Future Work 



In this paper, we initiated the study of collaborative document editing systems based on 
the 'web of trust' concept and examined the performance of such systems in large random 
graphs. We believe that this opens up several avenues for future work given the similarity 
between collaborative systems and recommendation systems. 

Several of our results were based on the strong assumption that the document content 
does not change during review. One interesting direction of future research could be to 
study the dynamics of document editing and the conditions for convergence. This could 
be done using a model similar to the one proposed by Acemoglu et al., who consider a 
network where each user holds a belief and users meet according to a Poisson process and 
exchange their beliefs. A slight modification of this model, with the addition of a common 
sink representing the document itself and users editing it according to the same Poisson 
process could be used to study the temporal effects of collaborative editing. Such a model 
could also be used to update a user's reputation depending on the fraction of the total 
document that is contributed by him, as proposed in [31]. 

Closely linked to the above paradigm is the game-theoretic analysis of collaborative editing. 
The whole process can be viewed as a repeated game between several collaborators 
whose strategies correspond to content to add or modify in the document. Alternatively, 
one could also design incentives or mechanisms to induce editors to reveal their true evaluation 
of a document. For instance, users could be required to submit their evaluations (say 
$r_i$) for the document under consideration and their utility could be a function of $|r_{avg} - r_i|$, 
where $r_{avg}$ is the average of the evaluations of all users. Such games would however require 
efficient modeling of user preferences and document content and a consideration of which of 
the equilibria are actually reachable in practice. 

We quantified trust between users as a value in the range [0, 1], thereby avoiding negative 
trust or distrust between users. Propagation of negative trust not only adds to the complexity 
of the model but also leads to certain inconsistencies as notions that are valid for trust may 
not be so realistic in the case of distrust. If user i distrusts user j, who in turn distrusts k, 
a simple propagation model would indeed result in user i trusting k. However, the notion 
'the enemy of my enemy is my friend' may not always be valid in real life. Guha et al. [22] 
try tackling some of these issues via two separate matrices capturing trust and distrust 
respectively. One possible direction of future research could involve checking whether our 
results are valid when negative trust is introduced. We have also deliberately avoided the 
effect of malicious users in our system and focused more on the dynamics of recognizing 
vandals. It would be interesting to study convergence in a system where a small fraction of 
users have malicious intent. 

We showed the efficiency of our system in Erdős-Rényi graphs with conservative values of 
the edge probability parameter p. Whether this efficiency is improved or worsened in more 
realistic models like Small World \^T] or the scientific co-authorship graph [30] warrants 
further experiments. 

A Trust Updation Model 

In the previous sections, we defined the idea of trust for collaborative-working contexts 
and proposed a satisfaction estimation model based on this. The model and associated 
algorithms, however, assume the existence of trust between various pairs of users and do 
not dwell upon how to determine the trust between users. We now propose a generic trust 
updation mechanism that calculates the new trust value, given the old trust between a pair 
of users, and their current ratings. Our model is motivated by the following observations: 

1. Content in most collaborative systems cannot be purely objective as it is provided by 
human editors. Therefore, the trust between editors is also subjective and depends on 
their individual opinions. A user may greatly mistrust another user whose opinions do 
not concur with his own. We conclude that the users with similar outlook and content 
should have high mutual trust and vice-versa. This is in congruence with the definition 
of trust that we used in Section 2.1: that trust is the amount of faith a user has in the 
actions of others. 

2. We mentioned earlier that the social network structure in the system may be explicitly 
provided or derived implicitly. Indeed, the underlying network of most collaborative 
systems cannot be directly expressed in terms of friends, contacts and followers and has 
to be calculated using known information. We aim to preserve this property of web- 
based collaborative systems and estimate trust between users by using their previous 
interactions with respect to a common article of reference. When no such standard 
exists, we assume that the trust between the users is zero. 

3. Updating trust based on a user's overall contribution to the published document, while 
objective, is more suited for reputation systems. For instance, in [31], the overall 
trust value of a user (say i) is calculated as the fraction of the total number of words 
in the article, written by that particular user. Such systems depend somewhat on 
semantics, and hence can be cheated to a certain extent. In the above model, a user 
may deliberately rewrite simple sentences in a complex manner in order to increase his 
trust / reputation. 

We use the following key idea for updating trust: a user i's trust $t_{ij}$ in another user 
j should increase if the rating of user i is similar to that of user j at a particular stage 
of document development. The farther the rating of user i is from j's rating, the more j 
will reduce its trust in i. Stating this mathematically, the trust value should be inversely 
dependent on $|r_i - r_j|$. Hence, the general updation formula should be as follows: 

$$t'_{ij} = \gamma\, t_{ij} + (1 - \gamma)\, f(|r_i - r_j|), \tag{19}$$

where $t'_{ij}$ is the new trust value of user i in user j and $t_{ij}$ the old value, $\gamma$ is a parameter 
which determines the rate of change of trust and $f$ is a monotonically decreasing function. 
Using this model, the updation mechanism could be described as follows: 

Every time a user i reads and rates the article, update the trust values ($t_{ij}$ and 
$t_{ji}$) for all $j \in N_i$ using Equation (19). 
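
As a worked illustration of Equation (19), the following evaluates a single update. The numbers are hypothetical and chosen only for arithmetic convenience: $\gamma = 0.8$, old trust $t_{ij} = 0.5$, and the linear choice $f(x) = 1 - x$, which is assumed here purely for the example.

```latex
% Case 1: similar ratings, r_i = 0.9, r_j = 0.7
|r_i - r_j| = 0.2, \qquad f(0.2) = 0.8
t'_{ij} = 0.8 \cdot 0.5 + (1 - 0.8) \cdot 0.8 = 0.40 + 0.16 = 0.56

% Case 2: dissimilar ratings, r_i = 0.9, r_j = 0.1
|r_i - r_j| = 0.8, \qquad f(0.8) = 0.2
t'_{ij} = 0.8 \cdot 0.5 + (1 - 0.8) \cdot 0.2 = 0.40 + 0.04 = 0.44
```

Under these assumed values, agreement raises the trust from 0.5 to 0.56 and disagreement lowers it to 0.44. A larger $\gamma$ makes the old trust value more persistent.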



This mechanism results in fairness: when a user with few or no edges rates the article, 
trust values between this user and other existing raters are automatically calculated without 
any direct interaction. Also, by the Independence (between rater and rater) property, this 
trust updation model does not violate any of the other properties elucidated in Section 3. 
Before choosing a specific function for updating trust, we list the requirements that any 
model must follow: 

1. The function must be continuous and monotonically decreasing. 

2. $0 \le |r_i - r_j| \le 1$. Thus our function $f$ must satisfy the boundary conditions $f(0) = 1$ 
and $f(1) = 0$, as $t_{ij}$ also lies in the same range. 

Formula 

The simplest and most obvious function which satisfies our constraints is the linear function 
$f(x) = 1 - x$. Here we do not bother ourselves with the function behavior outside the 
domain [0, 1]. The following graph displays the function in the required domain. The overall 
updation formula becomes 

$$t'_{ij} = \gamma\, t_{ij} + (1 - \gamma)(1 - x),$$

where $x = |r_i - r_j|$. The problem with this function is that the slope of the linear function is too 
gradual, and it gives relatively high values even when there is a substantial difference in the 
ratings. For example, when x = 0.5, f(x) = 0.5 and if the previous trust between the users 
was less than this value, then this increases the trust between them. This is, however, not 
very practical as such a large difference in rating indicates a significant difference of opinion. 
We need a function that gives a high trust value when x is close to 0 but decreases rapidly 
after that. It is evident from this that a convex function would be appropriate for our needs. 
A general convex function that (asymptotically) fits our needs is 

$$f(x) = \frac{1}{1 + px} \tag{20}$$

The larger the value of p, the more convex the function is. Moreover, as $p \to \infty$, $f(1) \to 0$. 

Figure 6: Function plotted for three different values of p 



The function has been plotted for different values of p in Figure 6. The value of p can be 
chosen based on parameters like the total size of the community, the definition of trust as is 
appropriate for the task at hand and the value of $\gamma$ that is to be considered. 
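
To make the contrast between the two candidate functions concrete, the sketch below tabulates the linear function and the convex function side by side. It assumes Equation (20) reads f(x) = 1 / (1 + px), which matches the stated properties f(0) = 1 and f(1) tending to 0 as p grows. The function names and the values p = 1, 5, 20 are illustrative choices, not necessarily those used in Figure 6.

```python
def f_linear(x):
    """Linear candidate f(x) = 1 - x on the domain [0, 1]."""
    return 1.0 - x


def f_convex(x, p):
    """Convex candidate f(x) = 1 / (1 + p*x); p > 0 controls how fast it drops."""
    return 1.0 / (1.0 + p * x)


if __name__ == "__main__":
    # One row per rating difference x: linear value, then convex values for each p.
    for x in (0.0, 0.1, 0.5, 1.0):
        row = [f_linear(x)] + [f_convex(x, p) for p in (1, 5, 20)]
        print(x, [round(v, 3) for v in row])
```

At x = 0.5 the linear function returns 0.5, whereas the convex function with p = 5 returns about 0.29, so a large disagreement pulls the trust toward a much lower target. Note that f(1) = 1 / (1 + p) is never exactly 0 for finite p, which is why the boundary condition is only met asymptotically.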

The advantage of using such a model is that one can build the whole trust matrix based 
solely on the revision history of the article. Such a model, when implemented centrally also 
provides some incentive against vandalism and manipulation. Firstly, users who deliberately 
provide the article an incorrect rating to further their own motives incur the wrath of low 
trust from other users. Moreover, any algorithm that selects raters based on trust such as 
the one described in Section 5.2 would never select such a user due to his poor impact on 
other collaborators. Secondly, it is possible for a user to manipulate the system by providing 
a rating similar to that of other users in order to increase his trustworthiness. This is avoided 
by keeping ratings private thereby encouraging good behavior. 
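
The claim that the trust matrix can be built from the revision history alone can be sketched as a replay of rating events. This is a hedged illustration under several assumptions that are not fixed by the text: the history is a list of hypothetical `(user, rating)` pairs with ratings in [0, 1], the set N_i is taken to be all users who have already rated, trust starts at zero for every pair, and f is the convex function 1 / (1 + px) with illustrative defaults for gamma and p.

```python
def build_trust_matrix(n_users, rating_events, gamma=0.8, p=10.0):
    """Replay (user, rating) events in order and return the trust matrix.

    trust[i][j] is t_ij. Pairs of users that never both rated stay at 0.0.
    gamma and p are illustrative defaults, not values from the paper.
    """
    def f(x):
        # Convex decreasing function of the rating difference.
        return 1.0 / (1.0 + p * x)

    trust = [[0.0] * n_users for _ in range(n_users)]
    latest = {}  # most recent rating of each user who has rated so far
    for i, r_i in rating_events:
        for j, r_j in latest.items():
            if j == i:
                continue
            target = f(abs(r_i - r_j))
            # Update both directions, t_ij and t_ji, with the same rule.
            trust[i][j] = gamma * trust[i][j] + (1.0 - gamma) * target
            trust[j][i] = gamma * trust[j][i] + (1.0 - gamma) * target
        latest[i] = r_i
    return trust


if __name__ == "__main__":
    # Users 0 and 1 agree; user 2 rates far from both.
    history = [(0, 0.9), (1, 0.9), (2, 0.1)]
    for row in build_trust_matrix(3, history, gamma=0.5, p=10.0):
        print([round(v, 3) for v in row])
```

In this toy history the agreeing pair ends with mutual trust 0.5, while the dissenting user ends with trust of about 0.056 toward and from each of the others, so a rater whose rating is far from the rest accumulates little trust.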